Speaker and Speech recognition by Audio-Visual lip biometrics
Abstract
This paper proposes a robust bimodal audio-visual speech and speaker recognition system based on lip-motion and speech biometrics. To increase the robustness of speech and speaker recognition, we propose a method that uses speaker lip-motion information extracted from low-resolution video sequences (128 × 128 pixels). We investigate a biometric system for speech recognition and speaker identification based on line-motion estimation combined with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level, showing favourable results on the XM2VTS database: digit recognition ranges from 83% to 100%, and speaker recognition reaches 100%.
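As a rough illustration of what fusing acoustic and visual features "at the feature level" can mean, the sketch below standardizes each modality separately and then concatenates the per-sample vectors. The abstract does not say how the features are normalized or combined, so the z-score step, the function names, and the toy values are all assumptions made for illustration; the fused vectors would then be passed to a classifier such as a Support Vector Machine, which is omitted here.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(audio_rows, visual_rows):
    """Feature-level fusion: normalize each modality, then concatenate per sample."""
    if len(audio_rows) != len(visual_rows):
        raise ValueError("audio and visual streams must have the same number of samples")
    audio_n = zscore_columns(audio_rows)
    visual_n = zscore_columns(visual_rows)
    return [a + v for a, v in zip(audio_n, visual_n)]


# Toy data (hypothetical): 3 samples, 2 audio features and 1 lip-motion feature each.
audio = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
visual = [[0.5], [0.7], [0.9]]
fused = fuse_features(audio, visual)  # 3 samples, each with 2 + 1 = 3 features
```

Normalizing before concatenation keeps one modality from dominating the fused vector merely because its features have a larger numeric range, which matters for distance- and margin-based classifiers.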
Related resources
Lip Biometrics for Digit Recognition
This paper presents a speaker-independent audio-visual digit recognition system that utilizes speech and visual lip signals. The extracted visual features are based on line-motion estimation obtained from video sequences with low resolution (128 × 128 pixels) to increase the robustness of audio recognition. The core experiments investigate lip motion biometrics as stand-alone as well as merged m...
An Automated System for Visual Biometrics
Biometrics has been a topic of great interest since the advent of the information age and will soon lead to a safer and simpler lifestyle where passcodes and keys are inherent to the user. We describe a system capable of automatically extracting visual features from a human face for use in dynamic visual biometrics. Automatic speech and speaker recognition has recently moved towards incorporati...
Detecting audio-visual synchrony using deep neural networks
In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investig...
3d Lip-tracking for Audio-visual Speech Recognition in Real Applications
In this paper, we present a solution to the problem of tracking 3D information about the shape of lips from a 2D picture of a speaker. We focus on lip-tracking of audio-visual speech recordings from the Czech in-vehicle audio-visual speech corpus (CIVAVC). The corpus consists of 4 h 40 min of audio-visual speech recordings of a driver, recorded in a car while driving in ordinary traffic. In real condit...
Speaker independent audio-visual database for bimodal ASR
This paper describes the audio-visual database collected at AT&T Labs-Research for the study of bimodal speech recognition. To date, this database consists of two multiple speaker parts, namely isolated confusable words and connected letters, thus allowing the study of some popular and relatively simple speaker independent audio-visual recognition tasks. In addition, a single speaker connected ...